-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(low-code cdk): add transformation to dynamic schema loader #176
feat(low-code cdk): add transformation to dynamic schema loader #176
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThis pull request introduces a new property named Changes
Possibly related PRs
Suggested reviewers
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
2517-2526
: LGTM! Consider adding clarifying documentation?The implementation looks good and aligns well with the PR objectives. The new
transformations
property follows the same pattern as stream-level transformations.What do you think about adding a description that clarifies when to use record selector transformations vs stream transformations? This could help users choose the appropriate level for their transformations. Something like:
transformations: title: Transformations - description: A list of transformations to be applied to each output record. + description: A list of transformations to be applied to each output record during the record selection phase. Use these transformations when you need to modify records before they are processed by stream-level transformations.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(9 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(4 hunks)
🔇 Additional comments (3)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1510-1516
: LGTM! Nice addition of the transformations field to RecordSelector.
The optional transformations field allows for flexible record manipulation through various transformation types. The implementation maintains backward compatibility since it's optional.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
1926-1931
: LGTM! Clean implementation of transformation handling.
The implementation correctly creates transformation components from the models and appends them to the transformations list. The code is straightforward and follows the factory pattern consistently.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1)
182-185
: LGTM! Great test coverage for the transformation functionality.
The tests thoroughly verify the transformation handling in both the schema parsing and component creation. The coverage includes both AddFields and RemoveFields transformations, ensuring the feature works as expected.
Also applies to: 302-309, 1315-1318, 1340-1340
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1502-1508
: LGTM! Consider adding an example to the field description?The implementation looks good and aligns well with the PR objectives to allow transformations in the record selector. The type definition and documentation are clear.
Would you consider adding an example to the field description to make it more user-friendly? Something like:
examples=[{ "transformations": [ {"type": "AddFields", "fields": [{"path": ["extra_field"], "value": "{{ record['some_field'] }}"}]}, {"type": "RemoveFields", "field_pointers": [["sensitive_data"]]} ] }]wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)
182-185
: LGTM! Consider enhancing test coverage?The addition of transformations to the record selector looks good. Would you consider adding a test case with multiple field pointers to ensure the RemoveFields transformation handles multiple fields correctly? wdyt?
2250-2268
: LGTM! Consider adding edge cases?The test case for AddFields transformation with string value type is well-structured. Would you consider adding test cases for edge cases like empty strings or special characters? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
(10 hunks)
🔇 Additional comments (2)
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)
2198-2201
: LGTM! Clear and helpful comment
The comment effectively explains why transformations appear twice in the test results. This helps future maintainers understand the expected behavior.
2216-2234
: LGTM! Well-structured test case
The test case for AddFields transformation without value type is thorough and clearly structured. The expected results are well-defined and the comment explains the duplication clearly.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
Outdated
Show resolved
Hide resolved
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
1650-1656
: Consider adding validation for the name fieldThe name field is marked as optional with a default empty string. Should we add some validation to ensure the name follows a specific pattern when provided? For example, ensuring it's a valid Python identifier? wdyt?
name: Optional[str] = Field( - "", description="The stream name.", example=["Users"], title="Name" + "", + description="The stream name.", + example=["Users"], + title="Name", + regex="^[a-zA-Z_][a-zA-Z0-9_]*$" )Also applies to: 1657-1658
1925-1929
: Consider adding type validation for partition router listsThe partition router can now be a single router or a list of routers. Should we add validation to ensure the list isn't empty when provided? wdyt?
partition_router: Optional[ Union[ CustomPartitionRouter, ListPartitionRouter, SubstreamPartitionRouter, List[ Union[ CustomPartitionRouter, ListPartitionRouter, SubstreamPartitionRouter ] ], ] ] = Field( [], description="PartitionRouter component that describes how to partition the stream, enabling incremental syncs and checkpointing.", title="Partition Router", + min_items=1 )
Also applies to: 2003-2007
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(10 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🔇 Additional comments (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1770-1780
: LGTM! Schema definition matches implementation.
The transformations field schema definition in the YAML file correctly matches the Python implementation, allowing the same transformation types and maintaining consistency.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (4)
531-533
: OAuth configuration enhancements look good!
The additions to OAuth configuration are well-documented with clear examples. The new fields enable better OAuth flow customization and user input handling.
Also applies to: 827-829, 898-912, 914-919
1848-1862
: LGTM: Transformations field added to DynamicSchemaLoader
The transformations field matches the structure used in DeclarativeStream, maintaining consistency across the codebase.
1971-1973
: LGTM: Optional download_extractor field
The download_extractor field is properly marked as optional with clear typing.
2071-2076
: LGTM: Components resolver field in DynamicDeclarativeStream
The components_resolver field is well-typed with a clear description of its purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (2)
107-107
: Could you add a docstring?
This adds the list of transformations with a factory default. Perhaps consider adding a short docstring for clarity, wdyt?
133-134
: Check if None is more explicit.
You're calling_transform(properties, {})
. If it's not used, passing None might be clearer. Would you consider that, wdyt?unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1)
67-71
: Additional test coverage for schema transformations.
You introduced a new "KeysToSnakeCase" transformation. Maybe add another transformation variety to ensure robust coverage, wdyt?airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
597-601
: create_keys_to_snake_transformation method.
This direct return is simple. Would it help if we passed parameters to the transformation, or is this minimal signature enough, wdyt?airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
1770-1780
: LGTM! The schema_transformations implementation looks clean and well-structured.The implementation follows the existing patterns in the codebase and aligns perfectly with the PR objectives. One small suggestion: would you consider adding a default empty array value for consistency with other similar fields in the schema? wdyt?
schema_transformations: title: Schema Transformations description: A list of transformations to be applied to the schema. type: array + default: [] items: anyOf: - "$ref": "#/definitions/AddFields" - "$ref": "#/definitions/CustomTransformation" - "$ref": "#/definitions/RemoveFields" - "$ref": "#/definitions/KeysToLower" - "$ref": "#/definitions/KeysToSnakeCase"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(6 hunks)airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py
(4 hunks)unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
(4 hunks)
🔇 Additional comments (13)
airbyte_cdk/sources/declarative/schema/dynamic_schema_loader.py (5)
7-7
: Imports look good.
No issues found here, wdyt?
16-16
: Direct import of RecordTransformation recognized.
Looks straightforward and helps ensure typed transformations. wdyt?
18-18
: Added dependencies for Config, StreamSlice, and StreamState.
Including these type hints keeps the code more expressive. wdyt?
138-138
: Schema snippet alignment.
The “properties” key is updated to the transformed schema. This is consistent with standard JSON Schema usage, wdyt?
141-153
: Order of transformations is enforced implicitly.
All transformations are applied in a loop, ensuring the final state is cumulative. If the order matters, we may want more explicit documentation. Would that add clarity, wdyt?
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (3)
238-238
: Renamed property to 'first_name'.
This aligns with the transformation logic. Changes look correct, wdyt?
253-254
: Sample data matches new property names.
This looks consistent with code changes. wdyt?
265-267
: Confirm naming logic for "FirstName" → "first_name".
The test properly verifies the snake-case transformation. Looks good, wdyt?
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1834-1848
: New field for schema transformations.
This field makes transformations more flexible. Seems well-defined. wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)
239-241
: New import for KeysToSnakeCaseModel.
Clear and consistent with the rest of the transformation imports. wdyt?
481-481
: Mapping KeysToSnakeCase in the factory.
This integration ensures your transformation can be constructed correctly. wdyt?
1653-1659
: schema_transformations list building.
You gather transformations into a list before injecting them. This is clean and readable, wdyt?
1674-1674
: Passing the transformations into DynamicSchemaLoader.
This final assignment is straightforward and consistent. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1)
248-250
: How about making the transformation flow more explicit in the test?The test correctly validates the transformation pipeline, but we could make it more self-documenting. What do you think about:
Adding comments to explain the transformation flow:
- PascalCase in schema response (
FirstName
)- Gets transformed to snake_case (
first_name
)- Static field gets added (
static_field
)Maybe split the test into smaller, focused test cases for each transformation? This would make it easier to maintain and debug, wdyt? 🤔
Also applies to: 264-265, 276-278
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py
(4 hunks)
🔇 Additional comments (1)
unit_tests/sources/declarative/schema/test_dynamic_schema_loader.py (1)
67-81
: Consider adding more test cases for schema transformations?
The new schema transformations look good! However, we might want to strengthen our test coverage. What do you think about adding test cases for:
- Failed transformations
- Empty transformation lists
- Multiple AddFields transformations
- Invalid field definitions
This would help ensure robustness, wdyt? 🤔
I liked the idea of having it in the record selector as we would resolve this in other structures. However, if we want transformations at the dynamic schema_loader level, should we also file an issue to have transformations for components_resolver? cc @maxi297 |
@aldogonzalez8 Do you have cases when we need transform records in |
Yes, I'm already using it here. I was considering moving away from my custom transformation to use the flatten transformation you created, but still, need transformations to be available for |
Could you help me undsersted why do you need this transformation at the |
If I understand correctly, with this transformation, you extract certain fields to the properties level and then reference them from the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Regarding @aldogonzalez8 's comment on the transformation for component resolver, it will be needed for source-airtable as well in order to put parent stream information on the records so that we can get those values available when we resolve. @darynaishchenko |
Yes, I could transform data on the extractor that receives such data, but I liked how this organically worked with transformations. This can happen in many ways, I guess. Still I feel that in the future in any place where we retrieve records/data it would be nice the chance to have transformations for such records/data. |
@maxi297 For airtable, I don't think we need it at all, as we will have access to parent fields in the components_mapping once we add stream_slicing to the http resolver. - airbyte-python-cdk/airbyte_cdk/sources/declarative/resolvers/http_components_resolver.py Line 97 in ec9e480
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APPROVED
@maxi297 @aldogonzalez8 @darynaishchenko For now, the migration to 'manifest only' (migration without any custom components) for source google sheets or source airtable is not hindered by the absence of transformations in the components resolver. I prefer to leave the components resolver without transformation, as once the transformation is moved to the record selector, we will achieve this for free. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
unit_tests/sources/declarative/test_manifest_declarative_source.py (1)
1370-1453
: Consider adding more test cases for edge cases?The test case effectively verifies basic flattening functionality. Would you consider adding tests for edge cases like:
- Empty nested objects
- Deeply nested fields (>2 levels)
- Handling of array fields
- Name collisions after flattening
This would help ensure robust handling of various data structures, wdyt?
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
1771-1782
: Consider enhancing the schema transformation documentation?The schema_transformations definition looks good, but would you consider adding:
- More detailed description of each transformation's behavior
- Examples of common use cases
- Notes on transformation order/precedence
- Potential impacts on schema validation
This would help users better understand how to effectively use these transformations, wdyt?
1238-1238
: Consider adding configuration options to FlattenFields?The current definition is simple and functional. Would you consider adding optional configuration parameters like:
max_depth
to control flattening depthseparator
to customize the field name separatorinclude_paths
/exclude_paths
for selective flatteningpreserve_arrays
flag to control array handlingThis would provide more flexibility in how fields are flattened, wdyt?
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1840-1855
: LGTM! Schema transformations field looks goodThe schema_transformations field is well-structured and properly typed. The transformations list matches the record-level transformations which provides consistency.
Hey, what do you think about adding a brief example in the field description to show how these transformations can be chained together? Something like:
schema_transformations: - type: KeysToSnakeCase - type: FlattenFieldswdyt? 🤔
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py
(2 hunks)unit_tests/sources/declarative/test_manifest_declarative_source.py
(1 hunks)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
1674-1674
: LGTM! Addition of FlattenFields transformation
The addition of FlattenFields to the transformations list is consistent with the existing transformation types and enhances the schema capabilities.
What
During dynamic schema generation, we need to transform keys or add fields to enhance the schema's compatibility and functionality.
How
A schema transformations component has been integrated into the dynamic schema loader. This component is responsible for managing a list of transformations, enabling us to easily modify schema keys or insert additional fields as needed.
Summary by CodeRabbit
New Features
schema_transformations
to enhance schema manipulation capabilities, allowing for various transformations such as adding/removing fields and key formatting.FlattenFields
transformation type in stream configuration.Bug Fixes
Tests
FlattenFields
transformation and adjusted existing tests for updated schema definitions.